
Conversation

@pmatczak2 commented Nov 18, 2024

Description

This pull request enhances the fit_loop.py file in the PyTorch Lightning framework. The changes focus on improving code documentation and clarifying the logic around data loading and management in the training loop.

Key Changes

  1. Enhanced Documentation:

    • Added detailed comments throughout the _FitLoop class to explain the purpose and functionality of key methods, particularly setup_data(), on_run_start(), and on_advance_start().
  2. Clarified Logic for Data Loading:

    • Explained the rationale behind multiple calls to setup_data(), emphasizing its role in keeping data loaders fresh for each epoch and the conditions under which they are reloaded (see the sketch after this list).
  3. Improved Readability:

    • Added comments that provide context for the control flow and the design decisions made, making the implementation easier for future developers to understand.
  4. Specific Method Highlights:

    • setup_data(): Documented its purpose in managing training data loaders and handling overfitting scenarios.
    • on_run_start(): Clarified its role in setting up validation data loaders and invoking relevant hooks.
    • on_advance_start(): Explained the necessity of calling setup_data() to prepare for the current epoch.
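
As an illustration of the reload behavior these comments describe, here is a minimal, hedged sketch of the related Trainer flag (the flag is real; the surrounding usage is illustrative and not code from this PR):

```python
import lightning.pytorch as pl

# By default, Lightning builds the training dataloaders once and reuses them
# across epochs. reload_dataloaders_every_n_epochs asks the fit loop to tear
# them down and rebuild them every n epochs, so each reload gets fresh loaders.
trainer = pl.Trainer(reload_dataloaders_every_n_epochs=1)
```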

Impact

These changes aim to enhance the maintainability and clarity of the codebase, making it easier for contributors to understand the data loading logic and the overall training process within PyTorch Lightning.


📚 Documentation preview 📚: https://pytorch-lightning--20426.org.readthedocs.build/en/20426/

   - Added comments to clarify the purpose of methods in fit_loop.py.
   - Explained the logic behind multiple calls to setup_data().
   - Documented handling of validation checks and logging intervals.

   These changes aim to enhance code readability and maintainability for future developers.
github-actions bot added the `pl` label (Generic label for PyTorch Lightning package) on Nov 18, 2024

codecov bot commented Nov 19, 2024

Codecov Report

Attention: Patch coverage is 94.44444% with 1 line in your changes missing coverage. Please review.

Project coverage is 81%. Comparing base (3627c5b) to head (6bd4bca).
Report is 73 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (3627c5b) and HEAD (6bd4bca).

HEAD has 553 fewer uploads than BASE.

| Flag              | BASE (3627c5b) | HEAD (6bd4bca) |
|-------------------|----------------|----------------|
| cpu               | 147            | 24             |
| lightning         | 104            | 19             |
| pytest            | 84             | 2              |
| python3.9         | 45             | 6              |
| gpu               | 4              | 2              |
| lightning_fabric  | 25             | 0              |
| python3.11        | 41             | 6              |
| python3.10        | 45             | 3              |
| python3.12        | 16             | 9              |
| pytorch2.1        | 43             | 9              |
| pytest-full       | 67             | 24             |
| pytorch_lightning | 22             | 7              |
| pytorch2.2        | 8              | 0              |
| pytorch2.3        | 12             | 3              |
| pytorch2.4        | 4              | 0              |
Additional details and impacted files
@@            Coverage Diff            @@
##           master   #20426     +/-   ##
=========================================
- Coverage      89%      81%     -8%     
=========================================
  Files         267      264      -3     
  Lines       23065    23148     +83     
=========================================
- Hits        20574    18740   -1834     
- Misses       2491     4408   +1917     

@lantiga (Collaborator) left a comment

Thanks @pmatczak2

It's generally a good thing to add comments to code, but comments need to add meaningful value for the reader in order not to be line noise.

That is, they should clarify in depth what happens, beyond what the source code already describes, rather than restate it superficially. See the inline comments.

else:
    combined_loader = train_dataloader

# Handle overfitting scenarios
@lantiga:

This is not correct. Overfit batches is a feature of Trainer for purposefully overfitting the model, to spot potential bugs in user code.
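
For context, a minimal hedged sketch of how that Trainer feature is used (the model and datamodule names are hypothetical placeholders):

```python
import lightning.pytorch as pl

# overfit_batches=2 makes the Trainer reuse the same 2 training batches every
# epoch; if the model and training step are bug-free, the loss should collapse
# towards zero on those batches.
trainer = pl.Trainer(overfit_batches=2, max_epochs=50)
trainer.fit(MyLitModel(), datamodule=MyDataModule())  # hypothetical classes
```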

train_dataloader = _request_dataloader(source)
trainer.strategy.barrier("train_dataloader()")

# Initialize combined loader
@lantiga:

This needs to be extended: from now on we assume we are dealing with a combined_loader; in the case of a single dataloader, we treat it as a combined_loader holding a single dataloader.
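
A hedged sketch of the convention being described, using the CombinedLoader utility (the DataLoader construction here is illustrative, not code from fit_loop.py):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from lightning.pytorch.utilities import CombinedLoader

# A single plain DataLoader...
train_dataloader = DataLoader(TensorDataset(torch.randn(8, 3)), batch_size=4)

# ...is normalized by wrapping it in a CombinedLoader, so downstream code in
# the fit loop only ever has to handle the combined case.
if isinstance(train_dataloader, CombinedLoader):
    combined_loader = train_dataloader
else:
    combined_loader = CombinedLoader(train_dataloader)
```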

if trainer.overfit_batches > 0:
    _resolve_overfit_batches(combined_loader, mode=RunningStage.TRAINING)

# Process each data loader
@lantiga:

For the comment to be useful, we need to expand on what "Process" means.

# Store epoch of dataloader reset for reload_dataloaders_every_n_epochs
self._last_train_dl_reload_epoch = trainer.current_epoch

# Validation check interval logic
@lantiga:

A comment is only useful if it sheds more light on the underlying mechanisms.
Adding "Validation check interval logic" on top of if isinstance(trainer.val_check_interval, int) ... is pretty redundant.

Explaining exactly what this logic does is what will make the comment useful for readers. Otherwise it's line noise.
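
For reference, a hedged sketch of the semantics behind that logic, as documented for the Trainer (the numbers are illustrative):

```python
import lightning.pytorch as pl

# A float val_check_interval is a fraction of the training epoch:
# here, run validation after every 25% of the training batches.
trainer = pl.Trainer(val_check_interval=0.25)

# An int val_check_interval is an absolute batch count:
# here, run validation every 1000 training batches, regardless of epoch length.
trainer = pl.Trainer(val_check_interval=1000)
```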

trainer.val_check_batch = int(self.max_batches * trainer.val_check_interval)
trainer.val_check_batch = max(1, trainer.val_check_batch)

# Warning for logging intervals
@lantiga:

A warning about what, and for what reason?
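
Based on the snippet's context, the warning presumably concerns the logging interval; here is a self-contained, hedged sketch of the kind of check such a warning would perform (this is an assumption for illustration, not the code from fit_loop.py):

```python
import warnings

def check_logging_interval(num_training_batches: int, log_every_n_steps: int) -> None:
    # If an epoch has fewer training batches than the logging interval,
    # training metrics would never be logged within an epoch, which is
    # worth warning the user about.
    if num_training_batches < log_every_n_steps:
        warnings.warn(
            f"log_every_n_steps={log_every_n_steps} exceeds the number of "
            f"training batches ({num_training_batches}); metrics may not be "
            "logged every epoch."
        )
```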

lantiga added the `waiting on author` label (Waiting on user action, correction, or update) on Nov 25, 2024
@lantiga commented Nov 25, 2024

Closing for now; feel free to reopen at any time.

lantiga closed this on Nov 25, 2024